Extra large vocabulary continuous speech recognition algorithm based on information retrieval

نویسنده

  • Valeriy Pylypenko
چکیده

This paper presents a new two-pass algorithm for Extra Large (more than 1M words) Vocabulary COntinuous Speech recognition based on the Information Retrieval (ELVIRCOS). The principle of this approach is to decompose a recognition process into two passes where the first pass builds the words subset for the second pass recognition by using information retrieval procedure. Word graph composition for continuous speech is presented. With this approach a high performances for large vocabulary speech recognition can be obtained.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Two-pass Algorithm for Large Vocabulary Continuous Speech Recognition

This paper presents a two-pass algorithm for Extra Large (more than 1M words) Vocabulary COntinuous Speech recognition based on the Information Retrieval (ELVIRCOS). The principle of this approach is to decompose a recognition process into two passes where the first pass builds the word subset for the second pass recognition by using information retrieval procedure. Word graph composition for c...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

PHAST: Spoken Document Retrieval Based on Sequence Alignment

This paper presents a new approach to spoken document information retrieval for spontaneous speech corpora. Classical approach to this problem is the use of an automatic speech recognizer (ASR) combined with standard information retrieval techniques, based on terms or n-grams. However, state-of-the-art large vocabulary continuous ASRs produce transcripts of spontaneous speech with a word error ...

متن کامل

Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search

Today, most systems use large vocabulary continuous speech recognition tools to produce word transcripts which have indexed transcripts and query terms retrieved from the index. However, query terms that are not part of the recognizer’s vocabulary cannot be retrieved, thereby affecting the recall of the search. Such terms can be retrieved using phonetic search methods. Phonetic transcripts can ...

متن کامل

A speaker adaptation algorithm using principal curves in noisy environments

A new speaker adaptation method of speech recognition is proposed in this paper utilizing principal curves algorithm. The key feature of this method is the construction of a transformation function based on the correlation information between observations of different acoustic states. This is an important a priori information crucial to improving system’s recognition performance. Herein the rel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007